
    Decision Tree-based Syntactic Language Modeling

    Statistical Language Modeling is an integral part of many natural language processing applications, such as Automatic Speech Recognition (ASR) and Machine Translation. N-gram language models dominate the field, despite having an extremely shallow view of language: a Markov chain of words. In this thesis, we develop and evaluate a joint language model that incorporates syntactic and lexical information in an effort to "put language back into language modeling." Our main goal is to demonstrate that such a model is not only effective but can be made scalable and tractable. We utilize decision trees to tackle the problem of sparse parameter estimation, which is exacerbated by the use of syntactic information jointly with word context. While decision trees have been previously applied to language modeling, there has been little analysis of the factors affecting decision tree induction and probability estimation for language modeling. In this thesis, we analyze several aspects that affect decision tree-based language modeling, with an emphasis on syntactic language modeling. We then propose improvements to the decision tree induction algorithm based on our analysis, as well as methods for constructing forest models (models consisting of multiple decision trees). Finally, we evaluate the impact of our syntactic language model on large-scale Speech Recognition and Machine Translation tasks. We also address a number of engineering problems associated with the joint syntactic language model in order to make it tractable. In particular, we propose a novel decoding algorithm that exploits the decision tree structure to eliminate unnecessary computation. We also propose and evaluate an approximation of our syntactic model by word n-grams, an approximation that makes it possible to incorporate our model directly into the CDEC Machine Translation decoder rather than using the model only for rescoring hypotheses produced with an n-gram model.
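
    The code below is a minimal, hedged sketch of the core idea: a decision tree estimating next-word probabilities from joint lexical and syntactic context. It uses toy data and scikit-learn as a stand-in for the thesis's own induction algorithm; every name in it is illustrative.

```python
# Minimal, hypothetical sketch (toy data; scikit-learn stands in for the
# thesis's own induction algorithm): estimate P(next word | previous word,
# previous POS tag), conditioning jointly on lexical and syntactic context.
from sklearn.tree import DecisionTreeClassifier

# Toy training events: (previous word, previous POS tag) -> next word.
corpus = [
    (("the", "DT"), "dog"),
    (("the", "DT"), "cat"),
    (("a", "DT"), "dog"),
    (("dog", "NN"), "barks"),
    (("cat", "NN"), "sleeps"),
]
words = sorted({w for (w, _), nxt in corpus} | {nxt for _, nxt in corpus})
tags = sorted({t for (_, t), _ in corpus})
w2i = {w: i for i, w in enumerate(words)}
t2i = {t: i for i, t in enumerate(tags)}

X = [[w2i[w], t2i[t]] for (w, t), _ in corpus]
y = [w2i[nxt] for _, nxt in corpus]

# The tree's splits cluster sparse joint (word, tag) contexts, and each leaf
# holds a probability distribution over next words, which is the role decision
# trees play in the model described above. (Numeric splits over integer ids
# are a simplification of question-based induction.)
tree = DecisionTreeClassifier(min_samples_leaf=1).fit(X, y)
for label, p in zip(tree.classes_, tree.predict_proba([[w2i["the"], t2i["DT"]]])[0]):
    if p > 0:
        print(words[label], p)  # e.g. cat 0.5 / dog 0.5 at this leaf
```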

    Streaming Speech-to-Confusion Network Speech Recognition

    In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR. In this paper, we present a novel streaming ASR architecture that outputs a confusion network while maintaining limited latency, as needed for interactive applications. We show that the 1-best results of our model are on par with those of a comparable RNN-T system, while the richer hypothesis set allows second-pass rescoring to achieve 10-20% lower word error rate on the LibriSpeech task. We also show that our model outperforms a strong RNN-T baseline on a far-field voice assistant task.
    Comment: Submitted to Interspeech 202
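
    As a rough illustration of the data structure involved (toy values and a stub LM, not the paper's architecture), the sketch below represents a confusion network as a sequence of word/posterior slots and rescores complete paths with an external LM score.

```python
# Toy illustration: a confusion network as a sequence of slots, each holding
# alternative words with posterior probabilities. Second-pass rescoring
# combines slot posteriors with an external LM score over complete paths.
from itertools import product
import math

confusion_net = [
    [("i", 0.9), ("a", 0.1)],
    [("like", 0.6), ("bike", 0.4)],
    [("speech", 0.8), ("peach", 0.2)],
]

def toy_lm_score(words):
    # Stand-in for a second-pass LM; here just a bigram bonus.
    bonus = {("i", "like"): 1.0, ("like", "speech"): 1.0}
    return sum(bonus.get(bg, 0.0) for bg in zip(words, words[1:]))

def rescore(net, lm_weight=0.5):
    # The 1-best path uses only slot posteriors; rescoring can promote a
    # different path from the richer hypothesis set the network encodes.
    best, best_score = None, -math.inf
    for path in product(*net):
        hyp = [w for w, _ in path]
        score = sum(math.log(p) for _, p in path) + lm_weight * toy_lm_score(hyp)
        if score > best_score:
            best, best_score = hyp, score
    return best

print(rescore(confusion_net))  # ['i', 'like', 'speech']
```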

    PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

    End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulty recognizing infrequent words personalized to the user, such as names and places. Rare words often have non-trivial pronunciations, and in such cases, human knowledge in the form of a pronunciation lexicon can be useful. We propose a PROnunciation-aware ConTextual adaptER (PROCTER) that dynamically injects lexicon knowledge into an RNN-T model by adding a phonemic embedding along with a textual embedding. The experimental results show that the proposed PROCTER architecture outperforms the baseline RNN-T model by improving the word error rate (WER) by 44% and 57% when measured on personalized entities and personalized rare entities, respectively, while increasing the model size (number of trainable parameters) by only 1%. Furthermore, when evaluated in a zero-shot setting to recognize personalized device names, we observe a 7% WER improvement with PROCTER, as compared to only a 1% WER improvement with text-only contextual attention.
    Comment: To appear in Proc. IEEE ICASSP
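
    The sketch below illustrates the general mechanism with a hypothetical module (invented names and shapes, not the PROCTER implementation): catalog entries carry both a textual and a phonemic embedding, and attention over them biases the encoder state.

```python
# Minimal sketch (hypothetical module and shapes): bias an encoder state
# toward catalog entries, where each entry is represented by a textual plus
# a phonemic embedding.
import torch
import torch.nn as nn

class PronunciationAwareAdapter(nn.Module):
    def __init__(self, d_model=256, d_emb=128):
        super().__init__()
        self.query = nn.Linear(d_model, d_emb)
        self.out = nn.Linear(d_emb, d_model)

    def forward(self, enc_state, text_emb, phone_emb):
        keys = text_emb + phone_emb               # (entries, d_emb): joint view
        q = self.query(enc_state)                 # (batch, d_emb)
        attn = torch.softmax(q @ keys.T, dim=-1)  # attend over catalog entries
        return enc_state + self.out(attn @ keys)  # biased encoder state

adapter = PronunciationAwareAdapter()
enc = torch.randn(2, 256)     # toy encoder states
text = torch.randn(10, 128)   # 10 catalog entries, textual embeddings
phone = torch.randn(10, 128)  # same entries, phonemic embeddings
print(adapter(enc, text, phone).shape)  # torch.Size([2, 256])
```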

    Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

    We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and of adapting the pretrained models to specific domains limits their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. The inserted low-rank matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets, with training times reduced by factors of 3.6 to 5.4.
    Comment: Accepted to IEEE ASRU 2023. 8 pages
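
    For concreteness, the sketch below shows the LoRA mechanism on a single linear layer (illustrative hyperparameters and a random stand-in weight, not the LoRB configuration): the pretrained weight is frozen and only the inserted low-rank factors are trained.

```python
# Minimal sketch of the LoRA mechanism behind the abstract: the pretrained
# weight W is frozen; only the low-rank factors A and B are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in, d_out, rank=8, alpha=16.0):
        super().__init__()
        # Frozen "pretrained" weight (random here for illustration).
        self.weight = nn.Parameter(torch.randn(d_out, d_in), requires_grad=False)
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # trainable
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # trainable, zero-init
        self.scale = alpha / rank

    def forward(self, x):
        # W x plus the scaled low-rank update (B A) x.
        return x @ self.weight.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768)
_ = layer(torch.randn(4, 768))  # works as a drop-in linear layer
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.4f}")  # small, as in the abstract
```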

    Thermal Denaturation and Aggregation of Myosin Subfragment 1 Isoforms with Different Essential Light Chains

    We compared thermally induced denaturation and aggregation of two isoforms of the isolated myosin head (myosin subfragment 1, S1) containing different "essential" (or "alkali") light chains, A1 or A2. We applied differential scanning calorimetry (DSC) to investigate the domain structure of these two S1 isoforms. For this purpose, a special calorimetric approach was developed to analyze the DSC profiles of irreversibly denaturing multidomain proteins. Using this approach, we revealed two calorimetric domains in the S1 molecule, with the more thermostable domain denaturing in two steps. Comparing the DSC data with temperature dependences of intrinsic fluorescence parameters and of S1 ATPase inactivation, we identified these two calorimetric domains as the motor domain and the regulatory domain of the myosin head, the motor domain being the more thermostable. A difference between the two S1 isoforms was revealed by DSC only in the thermal denaturation of the regulatory domain. We also applied dynamic light scattering (DLS) to analyze the aggregation of the S1 isoforms induced by their thermal denaturation. We found no appreciable difference between these S1 isoforms in their aggregation properties under ionic strength conditions close to those in the muscle fiber (in the presence of 100 mM KCl). Under these conditions, the kinetics of the aggregation process was independent of protein concentration, and the aggregation rate was limited by the irreversible denaturation of the S1 motor domain.
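
    For intuition about the kind of DSC profile such an approach deconvolves, the toy simulation below (illustrative parameters, not the paper's fitted values) generates the excess heat capacity of two independent, irreversibly denaturing domains at a constant scan rate.

```python
# Toy simulation: excess heat capacity of two independent calorimetric
# domains that denature irreversibly during a constant-rate DSC scan, the
# kind of multidomain profile the calorimetric approach above deconvolves.
import numpy as np

R = 8.314        # gas constant, J/(mol K)
v = 1.0 / 60.0   # scan rate, K/s (1 K/min)
T = np.linspace(300.0, 350.0, 2000)  # temperature grid, K
dT = T[1] - T[0]

def domain_cp(Ea, lnA, dH):
    # One-step irreversible denaturation N -> D with Arrhenius rate k(T):
    # dn/dT = -k(T) n / v, and excess Cp = dH * (-dn/dT).
    k = np.exp(lnA - Ea / (R * T))          # rate constant, 1/s
    n = np.exp(-np.cumsum(k / v) * dT)      # fraction still native
    return dH * k * n / v                   # excess heat capacity, J/(mol K)

# Two hypothetical domains; the second (the "motor" domain, in the abstract's
# terms) is the more thermostable one.
cp = domain_cp(Ea=3.0e5, lnA=110.0, dH=4.0e5) + domain_cp(Ea=3.2e5, lnA=114.0, dH=6.0e5)
print(f"peak excess Cp at {T[np.argmax(cp)]:.1f} K")
```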

    Recovery of Empty Nodes in Parse Structures

    In this paper, we describe a new algorithm for recovering WH-trace empty nodes. Our approach combines a set of hand-written patterns with a probabilistic model. Because the patterns heavily utilize regular expressions, the pertinent tree structures are covered using a limited number of patterns. The probabilistic model is essentially a probabilistic context-free grammar (PCFG) approach with the patterns acting as the terminals in production rules. We evaluate the algorithm's performance on gold trees and parser output using three different metrics. Our method compares favorably with state-of-the-art algorithms that recover WH-traces.
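
    The sketch below gives a much-simplified picture of the pattern-plus-probability idea (hypothetical patterns and probabilities, not the paper's grammar): regular expressions over bracketed trees act as terminals, each carrying a probability for inserting a WH-trace.

```python
# Minimal sketch: regular-expression patterns over flattened bracketed trees
# act as "terminals", each with a probability for inserting a WH-trace (*T*).
import re

PATTERNS = [
    # (pattern over a flattened tree, probability of inserting a trace there)
    (re.compile(r"\(SBAR \(WHNP.*?\)\) \(S \(NP.*?\)\) \(VP \w+"), 0.9),
    (re.compile(r"\(SBAR \(WHADVP.*?\)\) \(S"), 0.6),
]

def recover_traces(tree):
    # Pick the highest-probability matching pattern; the paper instead scores
    # pattern choices with PCFG-style production probabilities.
    best = None
    for pat, prob in PATTERNS:
        m = pat.search(tree)
        if m and (best is None or prob > best[1]):
            best = (m, prob)
    if best is None:
        return tree
    m, _ = best
    # Insert the empty node after the matched VP head (simplified placement).
    return tree[:m.end()] + " (-NONE- *T*)" + tree[m.end():]

print(recover_traces("(SBAR (WHNP (WP who)) (S (NP (PRP I)) (VP saw)))"))
# (SBAR (WHNP (WP who)) (S (NP (PRP I)) (VP saw (-NONE- *T*))))
```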